Introduction to Data Science in Python

Lecturer: Hillary Green-Lerman


1 Course Description

Begin your journey into Data Science! Even if you’ve never written a line of code in your life, you’ll be able to follow this course and witness the power of Python to perform Data Science. You’ll use data to solve the mystery of Bayes, the kidnapped Golden Retriever, and along the way you’ll become familiar with basic Python syntax and popular Data Science modules like matplotlib (for charts and graphs) and pandas (for tabular data).

2 Getting Started in Python

Welcome to the wonderful world of Data Analysis in Python! In this chapter, you’ll learn the basics of Python syntax, load your first Python modules, and use functions to get a suspect list for the kidnapping of Bayes, DataCamp’s prize-winning Golden Retriever.

2.1 Lecture: Dive into Python

2.2 Importing Python Modules

Modules (sometimes called packages or libraries) help group together related sets of tools in Python. Below are sample imports of modules that are frequently used by Data Scientists:

  1. statsmodels: used in machine learning; usually aliased as sm;
  2. seaborn: a visualization library; usually aliased as sns;
  3. numpy: performs math operations; usually aliased as np.

Note that each module has a standard alias, which allows you to access the tools inside of the module without typing as many characters. For example, aliasing lets us shorten seaborn.scatterplot() to sns.scatterplot().

import statsmodels.api as sm
import seaborn as sns
import numpy as np

Great job! You’ve learned to import three important data science modules!
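As a small illustration of what an alias does, the sketch below imports numpy twice, once under its full name and once as np. It only assumes that numpy is installed; the number 16 is an arbitrary example value.

```python
import numpy
import numpy as np

# Both names refer to the same module object; the alias is only a shorter label.
same_module = np is numpy

# The same function is reachable through either name.
long_result = numpy.sqrt(16)
short_result = np.sqrt(16)

print(same_module, long_result, short_result)
```

The alias changes nothing about how the module behaves; it only saves typing.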

2.3 Lecture: Creating Variables

2.4 Creating Numbers & Strings

Before we start looking for Bayes’ kidnapper, we need to fill out a Missing Puppy Report with details of the case. Each piece of information will be stored as a variable.

We define a variable using an equals sign (\(=\)). For instance,

# Bayes' favorite toy
favorite_toy = "Mr. Squeaky"
type(favorite_toy)
## <class 'str'>
# Bayes' owner
owner = 'DataCamp'
owner
## 'DataCamp'
# Bayes' height
height = 24
print('height  || ', height, ' || ', type(height))
## height  ||  24  ||  <class 'int'>
# Bayes' age
bayes_age = 4.0
print('bayes_age  || ', bayes_age, ' || ', type(bayes_age))
## bayes_age  ||  4.0  ||  <class 'float'>

Notes: it’s easy to make errors when you’re trying to type strings quickly.

  1. Don’t forget to use quotes! Without quotes, you’ll get a name error.
owner = DataCamp
  2. Use the same type of quotation mark. If you start with a single quote, and end with a double quote, you’ll get a syntax error.
fur_color = "blonde'
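To see both mistakes without stopping the program, the sketch below catches each error. The use of try/except and compile() is our own choice for illustration and is not part of the course exercise.

```python
# Forgetting the quotes: Python looks for a variable named DataCamp.
try:
    owner = DataCamp
except NameError as err:
    name_error_message = str(err)
    print("NameError:", name_error_message)

# Mismatched quotes cannot even be parsed, so the bad line is passed to
# compile() as text in order to catch the SyntaxError.
bad_line = "fur_color = \"blonde'"
try:
    compile(bad_line, "<example>", "exec")
    syntax_error_raised = False
except SyntaxError:
    syntax_error_raised = True

print("SyntaxError raised:", syntax_error_raised)
```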

2.5 Lecture: Fun with Functions

2.6 Load a DataFrame

A ransom note was left at the scene of Bayes’ kidnapping. Eventually, we’ll want to analyze the frequency with which each letter occurs in the note, to help us identify the kidnapper. For now, we just need to load the data from ransom.csv into Python. The data can be found here.

We’ll load the data into a DataFrame, a special data type from the pandas module. It represents spreadsheet-like data (something with rows and columns).

We can create a DataFrame from a CSV (comma-separated value) file by using the function pd.read_csv().

# Import pandas
import pandas as pd

# Load the 'ransom.csv' into a DataFrame
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intro.%20to%20Data%20Science%20in%20Python/ransom.csv'
ransom = pd.read_csv(url)

# Display DataFrame
ransom
##     letter_index letter  frequency
## 0              1      A       7.38
## 1              2      B       1.09
## 2              3      C       2.46
## 3              4      D       4.10
## 4              5      E      12.84
## 5              6      F       1.37
## 6              7      G       1.09
## 7              8      H       3.55
## 8              9      I       7.65
## 9             10      J       0.00
## 10            11      K       3.01
## 11            12      L       3.28
## 12            13      M       2.46
## 13            14      N       7.38
## 14            15      O       6.83
## 15            16      P       7.65
## 16            17      Q       0.00
## 17            18      R       4.92
## 18            19      S       4.10
## 19            20      T       6.28
## 20            21      U       4.37
## 21            22      V       1.09
## 22            23      W       2.46
## 23            24      X       0.00
## 24            25      Y       4.64
## 25            26      Z       0.00

Great job! You now have data that will eventually help you find Bayes’ kidnapper!
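If the URL is not reachable, pd.read_csv() can be tried on CSV text held in memory instead. The sketch below retypes the first three rows of the table above; wrapping the text in io.StringIO is our own workaround, not part of the exercise.

```python
import io

import pandas as pd

# First three rows of the ransom table shown above, typed in as CSV text.
csv_text = """letter_index,letter,frequency
1,A,7.38
2,B,1.09
3,C,2.46
"""

# io.StringIO wraps the text so that it can be read like a file.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```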

3 Loading Data in Pandas

In this chapter, you’ll learn a powerful Python library: pandas. pandas lets you read, modify, and search tabular datasets (like spreadsheets and database tables). You’ll examine credit card records for the suspects and see if any of them made suspicious purchases.

3.1 Lecture: What is pandas?

3.2 Loading a DataFrame

We’re still working hard to solve the kidnapping of Bayes, the Golden Retriever. Assume that we have narrowed the list of suspects to:

  • Fred Frequentist;
  • Ronald Aylmer Fisher;
  • Gertrude Cox;
  • Kirstine Smith.

We’ve obtained credit card records for all four suspects. Perhaps some of them made suspicious purchases before the kidnapping?

The records are in a CSV called “credit_records.csv”. The data can be found here.

# Import pandas under the alias pd
import pandas as pd

# Load the CSV "credit_records.csv"
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intro.%20to%20Data%20Science%20in%20Python/credit_records.csv'
credit_records = pd.read_csv(url)

# Display the first five rows of credit_records using the .head() method
credit_records.head()
##             suspect         location              date         item  price
## 0    Kirstine Smith   Groceries R Us   January 6, 2018     broccoli   1.25
## 1      Gertrude Cox  Petroleum Plaza   January 6, 2018  fizzy drink   1.90
## 2  Fred Frequentist   Groceries R Us   January 6, 2018     broccoli   1.25
## 3      Gertrude Cox   Groceries R Us  January 12, 2018     broccoli   1.25
## 4    Kirstine Smith    Clothing Club   January 9, 2018        shirt  14.25

What do you notice about the credit records?

3.3 Inspecting a DataFrame

We’ve loaded the credit card records of our four suspects into a DataFrame called credit_records. Let’s learn more about the structure of this DataFrame. How many rows are in credit_records?

credit_records.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 104 entries, 0 to 103
## Data columns (total 5 columns):
##  #   Column    Non-Null Count  Dtype  
## ---  ------    --------------  -----  
##  0   suspect   104 non-null    object 
##  1   location  104 non-null    object 
##  2   date      104 non-null    object 
##  3   item      104 non-null    object 
##  4   price     104 non-null    float64
## dtypes: float64(1), object(4)
## memory usage: 4.2+ KB
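The row count reported by .info() (104 entries here) can also be read directly with .shape or len(). The sketch below uses only the five rows displayed by .head() earlier, retyped by hand, so it reports 5 rows rather than 104; the name sample_records is hypothetical.

```python
import pandas as pd

# The five rows shown by credit_records.head(), retyped by hand.
sample_records = pd.DataFrame({
    "suspect": ["Kirstine Smith", "Gertrude Cox", "Fred Frequentist",
                "Gertrude Cox", "Kirstine Smith"],
    "location": ["Groceries R Us", "Petroleum Plaza", "Groceries R Us",
                 "Groceries R Us", "Clothing Club"],
    "date": ["January 6, 2018", "January 6, 2018", "January 6, 2018",
             "January 12, 2018", "January 9, 2018"],
    "item": ["broccoli", "fizzy drink", "broccoli", "broccoli", "shirt"],
    "price": [1.25, 1.90, 1.25, 1.25, 14.25],
})

# .shape is a (rows, columns) tuple; len() gives the number of rows.
n_rows, n_cols = sample_records.shape
print(n_rows, n_cols, len(sample_records))
```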

3.4 Lecture: Selecting columns

3.5 Two methods for selecting columns

Once again, we’ve loaded the credit card records of our four suspects into a DataFrame called credit_records. Let’s examine the items that they’ve purchased.

# Select the column item from credit_records
# Use brackets and string notation
credit_records["item"]
## 0         broccoli
## 1      fizzy drink
## 2         broccoli
## 3         broccoli
## 4            shirt
##           ...     
## 99           shirt
## 100          pants
## 101          dress
## 102         burger
## 103      cucumbers
## Name: item, Length: 104, dtype: object
# Select the column item from credit_records
# Use dot notation
credit_records.item
## 0         broccoli
## 1      fizzy drink
## 2         broccoli
## 3         broccoli
## 4            shirt
##           ...     
## 99           shirt
## 100          pants
## 101          dress
## 102         burger
## 103      cucumbers
## Name: item, Length: 104, dtype: object

Great job! Notice that both notations returned the same output.

Another junior detective is examining a DataFrame of Missing Puppy Reports. The data can be found here.

url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intro.%20to%20Data%20Science%20in%20Python/mpr.csv'
mpr = pd.read_csv(url)

# Use info() to inspect mpr
print(mpr.info())
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 6 entries, 0 to 5
## Data columns (total 5 columns):
##  #   Column      Non-Null Count  Dtype 
## ---  ------      --------------  ----- 
##  0   Dog Name    6 non-null      object
##  1   Owner Name  5 non-null      object
##  2   Dog Breed   6 non-null      object
##  3   Status      6 non-null      object
##  4   Age         6 non-null      int64 
## dtypes: int64(1), object(4)
## memory usage: 368.0+ bytes
## None
# Select column "Dog Name" from mpr
name = mpr["Dog Name"]

# Select column "Status" from mpr
is_missing = mpr["Status"]

# Display the columns
print(name, is_missing)
## 0      Bayes
## 1    Sigmoid
## 2     Sparky
## 3    Theorem
## 4        Ned
## 5      Benny
## Name: Dog Name, dtype: object 0    Still Missing
## 1    Still Missing
## 2            Found
## 3            Found
## 4    Still Missing
## 5            Found
## Name: Status, dtype: object
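Note that the column names in mpr contain spaces, which is why bracket notation is used here. The sketch below rebuilds two of the columns by hand (values taken from the outputs in these notes) to show when each notation applies; the name mpr_sample is hypothetical.

```python
import pandas as pd

# Two columns of the Missing Puppy Reports, retyped by hand.
mpr_sample = pd.DataFrame({
    "Dog Name": ["Bayes", "Sigmoid", "Sparky", "Theorem", "Ned", "Benny"],
    "Age": [1, 2, 3, 4, 2, 3],
})

# Bracket notation works for any column name, including names with spaces.
names = mpr_sample["Dog Name"]

# Dot notation works only when the name is a valid Python identifier.
ages_dot = mpr_sample.Age
ages_bracket = mpr_sample["Age"]
print(ages_dot.equals(ages_bracket))

# Writing mpr_sample.Dog Name would be a syntax error because of the space.
```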

3.6 Lecture: Selecting rows with logic

3.7 Logical testing

Let’s practice writing logical statements and displaying the output.

Recall that we use the following operators:

  • \(==\) tests that two values are equal;
  • \(!=\) tests that two values are not equal;
  • \(>\) and \(<\) test whether a value is greater than or less than another, respectively;
  • \(>=\) and \(<=\) test whether a value is greater than or equal to, or less than or equal to, another, respectively.

The variable height_inches represents the height of a suspect. Is height_inches greater than 70 inches?

height_inches = 65
height_inches > 70
## False

The variable plate1 represents a license plate number of a suspect. Is it equal to FRQ123?

plate1 = 'FRQ123'
plate1 == "FRQ123"
## True

The variable fur_color represents the color of Bayes’ fur. Is fur_color not equal to “brown”?

fur_color = 'blonde'
fur_color != "brown"
## True

Great job! Let’s use these logical statements to select some rows!

3.8 Selecting missing puppies

Let’s return to our DataFrame of missing puppies, which is loaded as mpr. Let’s select a few different rows to learn more about the other missing dogs.

# Select the dogs where Age is greater than 2
mpr[mpr.Age > 2]
##   Dog Name             Owner Name       Dog Breed Status  Age
## 2   Sparky             Dr. Apache   Border Collie  Found    3
## 3  Theorem  Joseph-Louis Lagrange  French Bulldog  Found    4
## 5    Benny   Hillary Green-Lerman          Poodle  Found    3
# Select the dogs whose Status is equal to Still Missing
mpr[mpr.Status == "Still Missing"]
##   Dog Name    Owner Name         Dog Breed         Status  Age
## 0    Bayes      DataCamp  Golden Retriever  Still Missing    1
## 1  Sigmoid           NaN         Dachshund  Still Missing    2
## 4      Ned  Tim Oliphant          Shih Tzu  Still Missing    2
# Select all dogs whose Dog Breed is not equal to Poodle
mpr[mpr["Dog Breed"] != "Poodle"]
##   Dog Name             Owner Name         Dog Breed         Status  Age
## 0    Bayes               DataCamp  Golden Retriever  Still Missing    1
## 1  Sigmoid                    NaN         Dachshund  Still Missing    2
## 2   Sparky             Dr. Apache     Border Collie          Found    3
## 3  Theorem  Joseph-Louis Lagrange    French Bulldog          Found    4
## 4      Ned           Tim Oliphant          Shih Tzu  Still Missing    2

Great job! Now that you’re familiar with selecting rows, let’s examine the credit report data!
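It can help to look at the logical test on its own before using it to select rows. In the sketch below, the comparison produces one True or False per row, and placing that result inside the brackets keeps only the True rows. The DataFrame is retyped from the outputs above; the names mpr_sample and is_older are hypothetical.

```python
import pandas as pd

# Three columns of the Missing Puppy Reports, retyped by hand.
mpr_sample = pd.DataFrame({
    "Dog Name": ["Bayes", "Sigmoid", "Sparky", "Theorem", "Ned", "Benny"],
    "Status": ["Still Missing", "Still Missing", "Found",
               "Found", "Still Missing", "Found"],
    "Age": [1, 2, 3, 4, 2, 3],
})

# The comparison alone gives one boolean per row.
is_older = mpr_sample["Age"] > 2
print(is_older.tolist())

# Using the booleans inside the brackets keeps only the rows marked True.
older_dogs = mpr_sample[is_older]
print(older_dogs)
```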

3.9 Narrowing the list of suspects

Recall the list of suspects that might have kidnapped Bayes:

  • Fred Frequentist;
  • Ronald Aylmer Fisher;
  • Gertrude Cox;
  • Kirstine Smith.

We’d like to narrow this list down, so we obtained credit card records for each suspect. We’d like to know if any of them recently purchased dog treats to use in the kidnapping. If they did, they would have visited ‘Pet Paradise’.

The credit records have been loaded into a DataFrame called credit_records.

# Select purchases from 'Pet Paradise'
credit_records[credit_records.location == 'Pet Paradise']
## Empty DataFrame
## Columns: [suspect, location, date, item, price]
## Index: []

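The intended finding of this exercise is that both Fred Frequentist and Gertrude Cox purchased dog treats. Perhaps they were trying to lure Bayes into their trap? Note, however, that the filter above returned an empty DataFrame, so no location value in this copy of the data matches ‘Pet Paradise’ exactly, and the output shown does not confirm this finding.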
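When an equality filter returns an empty DataFrame, as in the output above, one possible cause is stray whitespace around the text values. This is only a guess about the file, not something these notes confirm. The sketch below uses made-up rows and placeholder suspect names to show how .str.strip() can make such a comparison succeed.

```python
import pandas as pd

# Made-up rows for illustration: the first location has a stray leading space.
records = pd.DataFrame({
    "suspect": ["Suspect A", "Suspect B"],
    "location": [" Pet Paradise", "Groceries R Us"],
    "item": ["dog treats", "broccoli"],
})

# An exact comparison misses the row because of the extra space.
exact = records[records.location == "Pet Paradise"]

# Stripping surrounding whitespace first lets the comparison match.
cleaned = records[records.location.str.strip() == "Pet Paradise"]

print(len(exact), len(cleaned))
```

Checking credit_records.location.unique() is a quick way to see the exact stored values before writing a filter.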

4 Plotting Data with matplotlib

Get ready to visualize your data! You’ll create line plots with another Python module: Matplotlib. Using line plots, you’ll analyze the letter frequencies from the ransom note and several handwriting samples to determine the kidnapper.

4.1 Lecture: Creating line plots

4.2 Working hard

Several police officers have been working hard to help us solve the mystery of Bayes, the kidnapped Golden Retriever. Their commanding officer wants to know exactly how hard each officer has been working on this case. Officer Deshaun has created a DataFrame called deshaun to track the amount of time he spent working on this case. The DataFrame contains two columns:

  • day_of_week: a string representing the day of the week;
  • hours_worked: the number of hours that a particular officer worked on the Bayes case.

The data can be found here.

# From matplotlib, import pyplot under the alias plt
from matplotlib import pyplot as plt

# Plot Officer Deshaun's hours_worked vs. day_of_week
plt.plot(deshaun.day_of_week, deshaun.hours_worked)

# Display Deshaun's plot
plt.show()

Great job! It seems like Deshaun works a lot on Monday and Friday, but not so much on Wednesday. In the next exercise, you’ll compare Deshaun’s work to his coworkers’ hours.
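The plotting code above assumes that a DataFrame named deshaun already exists. To try it without the original file, a stand-in can be built by hand as in the sketch below. The hours are placeholder values invented for illustration, not Deshaun's real data, and the Agg backend is selected only so the sketch also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption about your setup

import pandas as pd
from matplotlib import pyplot as plt

# Placeholder values for illustration only; these are not the real hours.
deshaun = pd.DataFrame({
    "day_of_week": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "hours_worked": [8, 6, 3, 5, 9],
})

# plt.plot returns a list of line objects, one per plotted line.
lines = plt.plot(deshaun.day_of_week, deshaun.hours_worked)
n_points = len(lines[0].get_ydata())
print(n_points)

# In an interactive session, plt.show() would display the figure here.
plt.close()
```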

4.3 Or hardly working?

Two other officers have been working with Deshaun to help find Bayes. Their names are Officer Mengfei and Officer Aditya. Deshaun used their time cards to create two more DataFrames: mengfei and aditya. Let’s plot all three lines together to see who was working hard each day.

# Plot Officer Deshaun's hours_worked vs. day_of_week
plt.plot(deshaun.day_of_week, deshaun.hours_worked)

# Plot Officer Aditya's hours_worked vs. day_of_week
plt.plot(aditya.day_of_week, aditya.hours_worked)

# Plot Officer Mengfei's hours_worked vs. day_of_week
plt.plot(mengfei.day_of_week, mengfei.hours_worked)

# Display all three line plots
plt.show()

The orange line shows no hours worked on Thursday or Friday. But which officer does the orange line represent? Let’s learn how to add a legend to answer that, and continue to work on the mystery of who kidnapped Bayes.

4.4 Lecture: Adding text to plots

4.5 Adding a legend

Officers Deshaun, Mengfei, and Aditya have all been working with you to solve the kidnapping of Bayes. Their supervisor wants to know how much time each officer has spent working on the case.

Deshaun created a plot of data from the DataFrames deshaun, mengfei, and aditya previously. Now he wants to add a legend to distinguish the three lines.

# Officer Deshaun
plt.plot(deshaun.day_of_week, deshaun.hours_worked, label = 'Deshaun')

# Add a label to Aditya's plot
plt.plot(aditya.day_of_week, aditya.hours_worked, label = 'Aditya')

# Add a label to Mengfei's plot
plt.plot(mengfei.day_of_week, mengfei.hours_worked, label = 'Mengfei')

# Add a command to make the legend display
plt.legend()

# Display plot
plt.show()

Great job! Mengfei’s line shows no hours worked on Monday and Tuesday. Let’s add some labels to this graph so that we can share it with Deshaun’s supervisor.

4.6 Adding labels

If we give a chart with no labels to Officer Deshaun’s supervisor, she won’t know what the lines represent.

We need to add labels to Officer Deshaun’s plot of hours worked.

# Lines
plt.plot(deshaun.day_of_week, deshaun.hours_worked, label = 'Deshaun')
plt.plot(aditya.day_of_week, aditya.hours_worked, label = 'Aditya')
plt.plot(mengfei.day_of_week, mengfei.hours_worked, label = 'Mengfei')

# Add a title
plt.title('Hours Worked per Days of Week')

# Add y-axis label
plt.ylabel('Hours Worked')

# Legend
plt.legend()

# Display plot
plt.show()

4.7 Adding floating text

Officer Deshaun is examining the number of hours that he worked over the past six months. The data can be found here.

url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intro.%20to%20Data%20Science%20in%20Python/six_months.csv'
six_months = pd.read_csv(url)
six_months
##   month  hours_worked
## 0   Jan           160
## 1   Feb           185
## 2   Mar           182
## 3   Apr           195
## 4   Jun            50

The number for June is low because he only had data for the first week. Let’s help Deshaun by adding an annotation to the graph to explain this.

# Create plot
plt.plot(six_months.month, six_months.hours_worked)

# Add annotation "Missing June data" at (2.5, 80)
plt.text(2.5, 80, "Missing June data")

# Display graph
plt.show()

Great job! The graph would have been confusing without that extra information.
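The coordinates passed to plt.text() are in data units. When the x-axis holds text categories such as month names, matplotlib places them at positions 0, 1, 2, and so on, so x = 2.5 falls between "Mar" and "Apr". The sketch below rebuilds six_months from the table above; the Agg backend is our own choice so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption about your setup

import pandas as pd
from matplotlib import pyplot as plt

# Values copied from the six_months table shown above.
six_months = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "Jun"],
    "hours_worked": [160, 185, 182, 195, 50],
})

plt.plot(six_months.month, six_months.hours_worked)

# Categories sit at x = 0, 1, 2, ..., so x = 2.5 lies between "Mar" and "Apr".
note = plt.text(2.5, 80, "Missing June data")
note_position = note.get_position()
print(note_position)

plt.close()
```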

4.8 Lecture: Styling graphs

4.9 Tracking crime statistics

Sergeant Laura wants to do some background research to help her better understand the cultural context for Bayes’ kidnapping. She has plotted Burglary rates in three U.S. cities using data from the Uniform Crime Reporting Statistics. The data can be found here.

Remember:

  • You can change linestyle to dotted (‘:’), dashed (‘--’), or no line (‘’);
  • You can change the marker to circle (‘o’), diamond (‘d’), or square (‘s’).

plt.plot(data["Year"], data["Phoenix Police Dept"],
         label = "Phoenix", color = "DarkCyan")
plt.plot(data["Year"], data["Los Angeles Police Dept"],
         label = "Los Angeles", linestyle = ':')
plt.plot(data["Year"], data["Philadelphia Police Dept"],
         label = "Philadelphia", marker = 's')
plt.legend()
plt.show()

Great job! This was a lot of work. Perhaps we can make this easier by setting a global style.
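The crime data itself is not loaded in these notes, so the sketch below uses made-up numbers purely to show the linestyle and marker options side by side. The variable names and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption about your setup

from matplotlib import pyplot as plt

# Made-up numbers, used only to demonstrate the styling options.
years = [2015, 2016, 2017, 2018]

dotted = plt.plot(years, [10, 12, 11, 13], label="dotted", linestyle=":")[0]
dashed = plt.plot(years, [8, 9, 7, 10], label="dashed", linestyle="--")[0]
# An empty linestyle hides the line, leaving only the square markers.
squares = plt.plot(years, [5, 6, 6, 7], label="squares only",
                   linestyle="", marker="s")[0]

plt.legend()

styles = (dotted.get_linestyle(), dashed.get_linestyle(), squares.get_marker())
print(styles)

plt.close()
```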

4.10 Playing with styles

Changing the plotting style is a fast way to change the entire look of your plot without having to update individual colors or line styles. Some popular styles include:

  • ‘fivethirtyeight’ - Based on the color scheme of the popular website;
  • ‘grayscale’ - Great for when you don’t have a color printer!
  • ‘seaborn’ - Based on another Python visualization library (listed as ‘seaborn-v0_8’ in recent matplotlib versions, as the output below shows);
  • ‘classic’ - The default style of matplotlib before version 2.0.

# Change the style to fivethirtyeight
plt.style.use('fivethirtyeight')

# Plot lines
plt.plot(data["Year"], data["Phoenix Police Dept"], label = "Phoenix")
plt.plot(data["Year"], data["Los Angeles Police Dept"], label = "Los Angeles")
plt.plot(data["Year"], data["Philadelphia Police Dept"], label = "Philadelphia")

# Add a legend
plt.legend()

# Display the plot
plt.show()

# Change the style to ggplot
plt.style.use('ggplot')

# Plot lines
plt.plot(data["Year"], data["Phoenix Police Dept"], label = "Phoenix")
plt.plot(data["Year"], data["Los Angeles Police Dept"], label = "Los Angeles")
plt.plot(data["Year"], data["Philadelphia Police Dept"], label = "Philadelphia")

# Add a legend
plt.legend()

# Display the plot
plt.show()

# View all available styles
plt.style.available
## ['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-nogrid', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-v0_8', 'seaborn-v0_8-bright', 'seaborn-v0_8-colorblind', 'seaborn-v0_8-dark', 'seaborn-v0_8-dark-palette', 'seaborn-v0_8-darkgrid', 'seaborn-v0_8-deep', 'seaborn-v0_8-muted', 'seaborn-v0_8-notebook', 'seaborn-v0_8-paper', 'seaborn-v0_8-pastel', 'seaborn-v0_8-poster', 'seaborn-v0_8-talk', 'seaborn-v0_8-ticks', 'seaborn-v0_8-white', 'seaborn-v0_8-whitegrid', 'tableau-colorblind10']

Great job! With this background information, you’re ready to finally find the kidnapper.
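Because style names can differ between matplotlib versions, it can be safer to look a name up in plt.style.available than to type it from memory. The sketch below also uses plt.style.context(), which applies a style only inside a with-block instead of changing it globally. The plotted points are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption about your setup

from matplotlib import pyplot as plt

# Look up the seaborn-based style names available in this installation.
seaborn_styles = [name for name in plt.style.available
                  if name.startswith("seaborn")]
print(seaborn_styles)

# plt.style.context applies the style only inside the with-block,
# leaving the global style untouched afterwards.
with plt.style.context("ggplot"):
    plt.plot([1, 2, 3], [2, 4, 3], label="demo")  # made-up points
    plt.legend()
    n_lines = len(plt.gca().get_lines())
    plt.close()

print(n_lines)
```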

4.11 Identifying Bayes’ kidnapper

We’ve narrowed the possible kidnappers down to two suspects:

  • Fred Frequentist;
  • Gertrude Cox.

The kidnapper left a long ransom note containing several unusual phrases. Let’s use a line plot to compare the frequency of letters in the ransom note to samples from the two main suspects.

Two more DataFrames have been loaded, besides ransom:

  • suspect1 contains the letter frequencies for the sample from Fred Frequentist;
  • suspect2 contains the letter frequencies for the sample from Gertrude Cox.

Each DataFrame contains the columns letter and frequency.

# Plot each line
plt.plot(ransom.letter, ransom.frequency,
         label = 'Ransom', linestyle = ':', color = 'gray')
plt.plot(suspect1.letter, suspect1.frequency, label = 'Fred Frequentist')
plt.plot(suspect2.letter, suspect2.frequency, label = 'Gertrude Cox')

# Add x- and y-labels
plt.xlabel("Letter")
plt.ylabel("Frequency")

# Add a legend
plt.legend()

# Display plot
plt.show()

It looks like Fred Frequentist is the kidnapper. Both the ransom note and Fred’s sample have a low frequency of H and a high frequency of P.

5 Different Types of Plots

In this final chapter, you’ll learn how to create three new plot types: scatter plots, bar plots, and histograms. You’ll use these tools to locate where the kidnapper is hiding and rescue Bayes, the Golden Retriever.

5.1 Lecture: Making a scatter plot

5.2 Charting cellphone data

We know that Freddy Frequentist is the one who kidnapped Bayes the Golden Retriever. Now we need to learn where he is hiding.

Our friends at the police station have acquired cell phone data, which gives some of Freddy’s locations over the past three weeks. It’s stored in the DataFrame cellphone. The x-coordinates are in the column ‘x’ and the y-coordinates are in the column ‘y’.

# Explore the data
cellphone.head()
##            x          y
## 0  28.136519  39.358650
## 1  44.642131  58.214270
## 2  34.921629  42.039109
## 3  31.034296  38.283153
## 4  36.419871  65.971441
# Create a scatterplot
plt.scatter(cellphone.x, cellphone.y)

# Add labels
plt.ylabel('Latitude')
plt.xlabel('Longitude')

# Display the plot
plt.show()

Great job! Next, we’ll use keyword arguments to make this plot a little bit prettier.

5.3 Modifying a scatterplot

Previously, we created a scatter plot to show Freddy Frequentist’s cell phone data.

Now, we will do some magic so that the plot will appear over a map of our town. If we just plot the data as we did before, we won’t be able to see the map or pick out the areas with the most points. We can fix this by changing the colors, markers, and transparency of the scatter plot.

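# Import the submodules explicitly: a bare "import PIL" or "import urllib"
# is not guaranteed to make PIL.Image or urllib.request available
from PIL import Image
import urllib.request
import numpy as np
url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intro.%20to%20Data%20Science%20in%20Python/town_map.png'
town_map = np.array(Image.open(urllib.request.urlopen(url)))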
plt.scatter(cellphone.x, cellphone.y, color = 'red',
            marker = 's', alpha = 0.1)
plt.imshow(town_map, 
           extent = [min(cellphone.x), max(cellphone.x),
                     min(cellphone.y), max(cellphone.y)])
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

Great job! Freddy has been spending a lot of time in Blue Meadows Park, Happy Mountain Trailhead, and Shady Groves Campsite.

5.4 Lecture: Making a bar chart

5.5 Build a simple bar chart

Officer Deshaun wants to plot the average number of hours worked per week for him and his coworkers. He has stored the hours worked in a DataFrame called hours.

url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intro.%20to%20Data%20Science%20in%20Python/hours.csv'
hours = pd.read_csv(url)
hours
##    officer  avg_hours_worked  std_hours_worked
## 0  Deshaun                45                 3
## 1  Mengfei                33                 9
## 2   Aditya                42                 5
# Create a bar plot from the DataFrame hours
plt.bar(hours.officer, hours.avg_hours_worked,
        # Add error bars
        yerr = hours.std_hours_worked)

# Display the plot
plt.show()

Excellent! Let’s keep investigating and see how each officer was spending his or her time.

5.6 Where did the time go?

Officer Deshaun wants to compare the hours spent on field work and desk work between him and his colleagues. In this DataFrame, he has split out the average hours worked per week into desk_work and field_work.

url = 'https://raw.githubusercontent.com/QuanNguyenIU/QuanNguyenIU.github.io/main/DataCamp/Python/Intro.%20to%20Data%20Science%20in%20Python/hours2.csv'
hours = pd.read_csv(url)

# Plot the number of hours spent on desk work
plt.bar(hours.officer, hours.desk_work, label = 'Desk Work')

# Plot the hours spent on field work on top of desk work
plt.bar(hours.officer, hours.field_work,
        bottom = hours.desk_work,label = "Field Work")

# Add a legend
plt.legend()

# Display the plot
plt.show()

Wonderful! It looks like Officer Aditya spent the most time on field work.
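The bottom keyword is what stacks the second set of bars on top of the first: each field-work bar starts at the height of the matching desk-work bar. The sketch below checks this on a hand-built table. The desk/field split is a placeholder invented for illustration (the contents of hours2.csv are not shown in these notes), and the name hours_sketch is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption about your setup

import pandas as pd
from matplotlib import pyplot as plt

# Placeholder split of weekly hours; not the real contents of hours2.csv.
hours_sketch = pd.DataFrame({
    "officer": ["Deshaun", "Mengfei", "Aditya"],
    "desk_work": [25, 20, 15],
    "field_work": [20, 13, 27],
})

desk = plt.bar(hours_sketch.officer, hours_sketch.desk_work, label="Desk Work")
field = plt.bar(hours_sketch.officer, hours_sketch.field_work,
                bottom=hours_sketch.desk_work, label="Field Work")
plt.legend()

# Each field-work bar starts where the desk-work bar ends.
field_starts = [bar.get_y() for bar in field]
stack_tops = [bar.get_y() + bar.get_height() for bar in field]
print(field_starts, stack_tops)

plt.close()
```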

5.7 Lecture: Making a histogram

5.8 Modifying histograms

Let’s explore how changes to keyword parameters in a histogram can change the output. Recall that:

  • range sets the minimum and maximum values that we will include in our histogram;
  • bins sets the number of bins (bars) in our histogram.

We’ll be exploring the weights of various puppies from the DataFrame puppies.
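As a minimal sketch of how these two keyword arguments behave, the example below draws a histogram of made-up puppy weights. The puppies DataFrame is not shown in these notes, so the list weights and its values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption about your setup

from matplotlib import pyplot as plt

# Made-up puppy weights; the real puppies data is not shown in these notes.
weights = [3, 5, 8, 12, 14, 15, 21, 22, 30, 35, 41, 60]

# bins=5 splits the range into five equal-width bars;
# range=(0, 50) ignores values outside 0 to 50 (here, the weight of 60).
counts, edges, _ = plt.hist(weights, bins=5, range=(0, 50))
print(counts)
print(edges)

plt.close()
```

The returned counts hold the height of each bar, and edges holds the boundaries between bars, so there is always one more edge than there are bins.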

5.9 Heroes with histograms

6 Final Words

Congratulations on completing the course! More courses, tracks and instructions can be found here. Happy learning!